General and nested Wiberg minimization

ABSTRACT

Wiberg minimization operates on a system with two sets of variables described by a linear function and in which some data or observations are missing. The disclosure generalizes Wiberg minimization, iteratively solving for a function that is nonlinear in both sets of variables, U and V. In one embodiment, a first function ƒ(U, V) may be defined that may be nonlinear in both a first set of variables U and a second set of variables V. The first function ƒ(U, V) may be transformed into ƒ(U, V(U)). First assumed values of the first set of variables U may be assigned. The second set of variables V may be iteratively estimated based upon the transformed first function ƒ(U, V(U)) and the assumed values of the first set of variables U such that ƒ(U, V(U)) may be minimized with respect to V. New estimates of the first set of variables U may be iteratively computed.

BACKGROUND

Known Wiberg minimization minimizes a function of two sets of variables, which is linear in at least one of the sets of variables. Examples of a function to be minimized include (1) the L₁ norm of a vector function; (2) the L₂ norm of a vector function; and (3) the negative likelihood of a set of observations. The disclosure generalizes Wiberg minimization to functions that are nonlinear in both sets of variables.

Wiberg demonstrated an L₂ factorization method for matrices with missing data that solved for one set of variables V in terms of the other set of variables U. It linearizes V about U and then minimizes with respect to U only. Subsequent work showed Wiberg's method had better convergence than using Levenberg-Marquardt to minimize with respect to U and V simultaneously. It has also been shown that Wiberg's approach may be adapted for L₁ matrix factorization using linear programming. This method outperformed an alternating convex programming method for L₁ factorization, establishing a new state-of-the-art technique for solving that type of problem.

Wiberg's L₂ method is an application of more general work on separable nonlinear minimization that uses the idea of solving for V in terms of U and then minimizing with respect to U only. A high-level approach to minimizing a general function in this manner has been described where V breaks down into small independent problems given U. But this approach focused on the specific case of nonlinear least squares, where the objective is linear in V. A separable method for maximum likelihood estimation for a nonlinear least squares problem linear in V has also been previously described. The Wiberg approach contrasts with methods that simply alternate between determining one set of unknowns while holding the other fixed. While these alternating methods can sometimes converge well, they can also fail to converge catastrophically, and they do not converge quadratically like second-order methods that minimize with respect to all of the variables simultaneously.

The Wiberg approach to matrix factorization breaks a matrix Y into low-rank factors U and V by solving for V given U in closed form, linearizing V(U) about U, and iteratively minimizing ∥Y−UV(U)∥₂ with respect to U only. The Wiberg approach optimizes the same objective while effectively removing V from the minimization; that is, Wiberg minimizes a function with respect to one set of variables only. Recently, this approach has been extended to L₁, minimizing ∥Y−UV(U)∥₁. This L₁-Wiberg approach outperformed the previous state-of-the-art for L₁ factorization.

BRIEF SUMMARY

One embodiment of the disclosed subject matter can comprise defining a first function ƒ(U, V) that may be nonlinear in both a first set of variables U and a second set of variables V. The first function ƒ(U, V) may be transformed into ƒ(U, V(U)) such that the members of the first set of variables U may be independent variables and the members of the second set of variables V may be dependent on the first set of variables. First assumed values of the first set of variables U may be assigned. The second set of variables V may be iteratively estimated based upon the transformed first function ƒ(U, V(U)) and the assumed values of the first set of variables U such that ƒ(U, V(U)) may be minimized with respect to V. New estimates of the first set of variables U may be iteratively computed to minimize the transformed first function ƒ(U, V(U)) based upon the assumed values of the first set of variables U, the calculated values of the second set of variables V, a derivative of the second set of variables V with respect to the first set of variables U, and a total derivative of the transformed first function ƒ(U, V(U)) with respect to the members of the first set of variables U. The derivative may be computed using finite differences, analytically, or both using finite differences and analytically. In addition, a nested minimization may be performed on any number of variables. A set of variables U₁, . . . , U_(n) may be defined. The first function ƒ(U, V) may be transformed into ƒ(U₁, V(U₁)), ƒ(U₂, V(U₁)), . . . , ƒ(U_(n), V(U_(n−1))), ƒ(U_(n+1), V(U_(n))) and minimized with respect to U_(n). The total derivative of the transformed first function may also be computed automatically such that the nesting may be performed with respect to n sets of variables.

The minimizing of the transformed first function ƒ(U, V(U)) based upon the assumed values of the first set of variables U may comprise minimizing L₁ error, minimizing L₂ error, or a maximum likelihood estimation. The L₁ error may be minimized using successive linear programming. The L₂ error may be minimized using the Gauss-Newton method. The maximum likelihood estimation may be determined by the Newton-Raphson method. For each iteration, a different subset of U may be selected and the transformed first function ƒ(U, V(U)) may be minimized with respect to the selected subset of the variables. A subset of U may be selected to be a smallest subset of U that creates a solvable subproblem.

This method may be further extended to perform a nested minimization. A second set of variables V may be defined in terms of a third set of variables D, and a second function ƒ(V, D) to be minimized may be defined that is nonlinear in both the second set of variables V and the third set of variables D. The second function ƒ(V, D) may be transformed into ƒ(V, D(V)) such that the members of the second set of variables V may be independent variables and the members of the third set of variables D may be dependent on the second set of variables V, and such that the first function is now represented by ƒ(U, V(U), D(V, U)). Second assumed values of the second set of variables V may be assigned. The third set of variables D may be iteratively estimated based upon the transformed second function ƒ(V, D(V)) and the second assumed values of the second set of variables V, such that ƒ(V, D(V)) may be minimized with respect to D. New estimates of the second set of variables V may be iteratively computed to minimize the transformed second function ƒ(V, D(V)) based upon the assumed values of the second set of variables V, the calculated values of the third set of variables D, a derivative of the third set of variables D with respect to the second set of variables V, and a total derivative of the transformed second function ƒ(V, D(V)) with respect to the members of the second set of variables V. New estimates of the first set of variables U may be iteratively computed to minimize the transformed first function ƒ(U, V(U), D(V, U)) based upon the assumed values of the first set of variables U, the calculated values of the second set of variables V, the calculated values of the third set of variables D, the derivative of D with respect to V, the derivative of D with respect to U, the derivative of V with respect to U, and the total derivative of the transformed first function ƒ(U, V(U), D(V, U)) with respect to the first set of variables U.

A plurality of images may be received from a camera. The plurality of images may comprise a sequence. One or more two-dimensional features may be identified in each of a plurality of images in the received sequence of images. A three-dimensional point associated with each of the identified one or more two-dimensional features may be estimated. Each of the one or more two-dimensional features may be tracked through successive images of the plurality of images. A two-dimensional image error may be iteratively minimized between each of the tracked one or more two-dimensional features and an image reprojection with respect to the three-dimensional point corresponding to the one or more two-dimensional features and a three-dimensional position of the camera corresponding to one or more of the plurality of images. The camera may have no known features or at least one known feature. Examples of features may include focal length, image center, and radial distortion. The determination of the camera's position may be performed using general Wiberg minimization or nested Wiberg minimization in accordance with embodiments of the subject matter disclosed herein. An L₁ norm of the image error may be minimized with respect to each of the at least one known feature simultaneously using successive linear programming. An L₁ norm of the image error may be minimized with respect to each of the one or more unknowns simultaneously using successive linear programming.

In another embodiment of the disclosed subject matter, a set of observations may be received. The observations may comprise at least one known observation and at least one unknown observation. A first low-rank factor U and a second low-rank factor V may be defined. An expression that describes the set of observations using low-rank factorization with respect to the first low-rank factor U and the second low-rank factor V may be minimized. This minimization may generate an error. The error generated by the minimizing of the expression may be minimized by minimizing each of the at least one unknown observation with respect to U and V simultaneously using successive linear programming. An estimate may be obtained of each of the at least one unknown observation based upon the step of minimizing the error.

Additional features, advantages, and embodiments of the disclosed subject matter may be set forth or apparent from consideration of the following detailed description, drawings, and claims. Moreover, it is to be understood that both the foregoing summary and the following detailed description are exemplary and are intended to provide further explanation without limiting the scope of the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further understanding of the disclosed subject matter, are incorporated in and constitute a part of this specification. The drawings also illustrate embodiments of the disclosed subject matter and together with the detailed description serve to explain the principles of embodiments of the disclosed subject matter. No attempt is made to show structural details in more detail than may be necessary for a fundamental understanding of the disclosed subject matter and various ways in which it may be practiced.

FIG. 1 shows a computer according to an embodiment of the disclosed subject matter.

FIG. 2 shows a network configuration according to an embodiment of the disclosed subject matter.

FIG. 3 shows an example process for L₁ matrix factorization according to an embodiment of the disclosed subject matter.

FIG. 4 shows an example process for general Wiberg minimization according to an embodiment of the disclosed subject matter.

FIG. 5 shows an example process for L₁-Wiberg bundle adjustment according to an embodiment of the disclosed subject matter.

FIG. 6 shows an example process for nested Wiberg minimization according to an embodiment of the disclosed subject matter.

FIG. 7 is a histogram of the final residuals for the 100 synthetic measurement matrices for the Wiberg technique and for the technique disclosed in accordance with an embodiment of the disclosed subject matter.

FIG. 8 is a histogram of the number of iterations until convergence, for the Wiberg technique and for the technique disclosed in accordance with an embodiment of the disclosed subject matter.

FIG. 9 is a graph showing the time for the linear programming method in one step of factorization as a function of problem size. Different curves give the times for different numbers of rows in U, for both Wiberg and for the technique in accordance with an embodiment of the disclosed subject matter. Both axes are logarithmic.

FIG. 10 is a graph with the time in seconds for the linear programming method in one step of bundle adjustment as a function of the number of images and points. Both axes are on a logarithmic scale.

FIG. 11 is a graph with the time in seconds for the full step of bundle adjustment, as a function of the number of images and points. Both axes are on a logarithmic scale.

FIG. 12 displays an example image from the "rover" sequence, with tracked points shown as black dots.

FIG. 13 is an oblique view of the camera motion recovered by the L₁-Wiberg bundle adjustment for the "rover" sequence.

FIG. 14 is a graph showing the time in seconds for the linear programming method in one step of projective bundle adjustment, as a function of the number of images and points. Both axes are on a logarithmic scale.

DETAILED DESCRIPTION

One embodiment of the disclosed subject matter generalizes the Wiberg approach beyond factorization to minimize an arbitrary nonlinear function of two sets of variables ƒ(U, V). This approach allows Wiberg minimization to be used for L₁ minimization, L₂ minimization, or maximum likelihood estimation.

The general method disclosed herein operates by solving for V iteratively rather than in closed form. In another embodiment of the disclosed subject matter, V can itself be split into two sets of variables found using Wiberg minimization because it is found iteratively. This embodiment of the disclosed subject matter results in a nested Wiberg minimization that can effectively minimize with respect to three sets of variables. Theoretically, n sets of variables may be minimized with more elaborate nesting, where n is a positive integer. This concept is discussed in detail below in the context of L₁ projective structure-from-motion, where the three sets of unknowns are camera projection matrices, three-dimensional point positions, and projective depths. The approximate camera position may be determined by performing a nested Wiberg minimization. Also described below is a simplified presentation of L₁-Wiberg factorization and a successive linear programming method for L₁ factorization that explicitly updates all of the unknowns simultaneously. The successive linear programming baseline outperforms L₁-Wiberg for most large inputs, establishing a new state-of-the-art for those cases. Comparisons with alternating methods were excluded in favor of stronger baselines that minimize with respect to all of the variables simultaneously.

Embodiments of the presently disclosed subject matter may be implemented in and used with a variety of component and network architectures. FIG. 1 is an example computer 20 suitable for implementing embodiments of the presently disclosed subject matter. The computer 20 includes a bus 21 which interconnects major components of the computer 20, such as a central processor 24; a memory 27 (typically RAM, but which may also include ROM, flash RAM, or the like); an input/output controller 28; a user display 22, such as a display screen via a display adapter; a user input interface 26, which may include one or more controllers and associated user input devices such as a keyboard, mouse, and the like, and may be closely coupled to the I/O controller 28; fixed storage 23, such as a hard drive, flash storage, Fibre Channel network, SAN device, SCSI device, and the like; and a removable media component 25 operative to control and receive an optical disk, flash drive, and the like.

The bus 21 allows data communication between the central processor 24 and the memory 27, which may include read-only memory (ROM) or flash memory (neither shown), and random access memory (RAM) (not shown), as previously noted. The RAM is generally the main memory into which the operating system and application programs are loaded. The ROM or flash memory can contain, among other code, the Basic Input-Output System (BIOS) which controls basic hardware operation such as the interaction with peripheral components. Applications resident with the computer 20 are generally stored on and accessed via a computer-readable medium, such as a hard disk drive (e.g., fixed storage 23), an optical drive, floppy disk, or other storage medium 25.

The fixed storage 23 may be integral with the computer 20 or may be separate and accessed through other interfaces. A network interface 29 may provide a direct connection to a remote server via a telephone link, to the Internet via an internet service provider (ISP), or a direct connection to a remote server via a direct network link to the Internet via a POP (point of presence) or other technique. The network interface 29 may provide such connection using wireless techniques, including a digital cellular telephone connection, Cellular Digital Packet Data (CDPD) connection, digital satellite data connection, or the like. For example, the network interface 29 may allow the computer to communicate with other computers via one or more local, wide-area, or other networks, as shown in FIG. 2.

Many other devices or components (not shown) may be connected in a similar manner (e.g., document scanners, digital cameras, and so on). Conversely, all of the components shown in FIG. 1 need not be present to practice the present disclosure. The components can be interconnected in different ways from that shown. The operation of a computer such as that shown in FIG. 1 is readily known in the art and is not discussed in detail in this application. Code to implement the present disclosure can be stored in computer-readable storage media such as one or more of the memory 27, fixed storage 23, removable media 25, or on a remote storage location.

FIG. 2 shows an example network arrangement according to an embodiment of the disclosed subject matter. One or more clients 10, 11, such as local computers, smart phones, tablet computing devices, and the like, may connect to other devices via one or more networks 7. The network may be a local network, wide-area network, the Internet, or any other suitable communication network or networks, and may be implemented on any suitable platform including wired and/or wireless networks. The clients may communicate with one or more servers 13 and/or databases 15. The devices may be directly accessible by the clients 10, 11, or one or more other devices may provide intermediary access, such as where a server 13 provides access to resources stored in a database 15. The clients 10, 11 also may access remote platforms 17 or services provided by remote platforms 17, such as cloud computing arrangements and services. The remote platform 17 may include one or more servers 13 and/or databases 15.

More generally, various embodiments of the presently disclosed subject matter may include or be embodied in the form of computer-implemented processes and apparatuses for practicing those processes. Embodiments also may be embodied in the form of a computer program product having computer program code containing instructions embodied in non-transitory and/or tangible media, such as floppy diskettes, CD-ROMs, hard drives, USB (universal serial bus) drives, or any other machine-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing embodiments of the disclosed subject matter. Embodiments also may be embodied in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing embodiments of the disclosed subject matter. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits. In some configurations, a set of computer-readable instructions stored on a computer-readable storage medium may be implemented by a general-purpose processor, which may transform the general-purpose processor or a device containing the general-purpose processor into a special-purpose device configured to implement or carry out the instructions. Embodiments may be implemented using hardware that may include a processor, such as a general-purpose microprocessor and/or an Application Specific Integrated Circuit (ASIC) that embodies all or part of the techniques according to embodiments of the disclosed subject matter in hardware and/or firmware. The processor may be coupled to memory, such as RAM, ROM, flash memory, a hard disk, or any other device capable of storing electronic information. The memory may store instructions adapted to be executed by the processor to perform the techniques according to embodiments of the disclosed subject matter.

One embodiment of the disclosed subject matter is an improvement of previous methods of L₁-Wiberg factorization, wherein the columns of V may be solved for separately rather than solving for V as a whole.

Linear programming solves the problem:

$$\min_{x}\; c^{T}x \quad \text{subject to} \quad Ax \le b,\; x \ge 0 \qquad (\text{Equation 1})$$

where x represents the vector of variables to be determined, c and b are vectors of known coefficients, and A is a known matrix of coefficients. As written here, the objective function c^(T)x is minimized, and Ax ≤ b and x ≥ 0 are the constraints over which the objective function is to be minimized. The problem may be solved using the simplex method. The "slack form" used by the simplex method converts the inequalities Ax ≤ b to equalities by introducing nonnegative slack variables s such that:

$$\min_{x}\; c^{T}x \quad \text{subject to} \quad [\,A \;\; I\,][\,x;\, s\,] = b,\; x \ge 0,\; s \ge 0 \qquad (\text{Equation 2})$$

The optimal solution [x; s] will include basic components x_(B), which can be nonzero, and nonbasic components x_(N), which will be zero. The columns of [A I] corresponding to x_(B) are the "basis" B, and Bx_(B) = b. Then, since x_(B) = B⁻¹b, it can be shown that:

$$\frac{dx_{B}}{dB} = -x_{B}^{T} \otimes B^{-1} \qquad (\text{Equation 3})$$

$$\frac{dx_{B}}{db} = B^{-1} \qquad (\text{Equation 4})$$

where ⊗ is the Kronecker product. By inserting zero derivatives corresponding to elements of the nonbasic components and dropping derivatives of the slack variables, these derivatives can be converted to dx/dA and dx/db.
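
For illustration, the following is a minimal Python sketch of Equations 3 and 4, assuming numpy is available and assuming the basis matrix B (the columns of [A I] in the basis) and the right-hand side b have already been extracted from a simplex solution; the function name is illustrative, not part of the disclosure. The derivative with respect to B acts on vec(B) in column-major order.

    import numpy as np

    def lp_solution_derivatives(B, b):
        # Basic solution of the LP: x_B = B^{-1} b.
        B_inv = np.linalg.inv(B)
        x_B = B_inv @ b
        # Equation 3: d x_B / d vec(B) = -(x_B^T kron B^{-1}),
        # where vec() stacks B column by column.
        dxB_dB = -np.kron(x_B[None, :], B_inv)
        # Equation 4: d x_B / d b = B^{-1}.
        dxB_db = B_inv
        return x_B, dxB_dB, dxB_db

A finite-difference check on a small random B and b is a quick way to confirm the orientation of the Kronecker product.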

The L₁ residual of an overdetermined linear system may be minimized using linear programming. This may be represented by:

$$\min_{y}\; \| d - Cy \|_{1} \qquad (\text{Equation 5})$$

Since the linear programming problem (Equation 1) requires x ≥ 0, y may be split into the difference of two nonnegative terms, y = y⁺ − y⁻. Then, letting t_(i) be the L₁ residual of individual row i of d − Cy:

$$t_{i} = \left| d_{i} - C_{i}(y^{+} - y^{-}) \right| \qquad (\text{Equation 6})$$

Converting Equation 6 to two inequalities results in:

$$C_{i}(y^{+} - y^{-}) - t_{i} \le d_{i} \qquad (\text{Equation 7})$$

$$-C_{i}(y^{+} - y^{-}) - t_{i} \le -d_{i} \qquad (\text{Equation 8})$$

The optimal y⁺, y⁻, t_(i) can be found with the linear program:

$$\min_{y^{+},\, y^{-},\, t}\; [\, 0 \;\; 0 \;\; \mathbf{1}^{T} \,] \begin{bmatrix} y^{+} \\ y^{-} \\ t \end{bmatrix} \qquad (\text{Equation 9})$$

and

$$\begin{bmatrix} C & -C & -I \\ -C & C & -I \end{bmatrix} \begin{bmatrix} y^{+} \\ y^{-} \\ t \end{bmatrix} \le \begin{bmatrix} d \\ -d \end{bmatrix} \qquad (\text{Equation 10})$$

The objective in Equation 9 minimizes the sum of the individual L₁ errors. The derivative of [y⁺ y⁻ t] with respect to the coefficient matrix and right-hand side of Equation 10 may be obtained as described above. Since y is a simple function of y⁺ and y⁻, and the coefficient matrix and right-hand side are simple functions of C and d, the derivatives dy/dC and dy/dd may be derived with some simple algebra and rearranging.
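
As an illustration of Equations 5-10, the following is a minimal Python sketch using scipy.optimize.linprog, whose default variable bounds already enforce the nonnegativity constraints; the function name l1_min is illustrative, not part of the disclosure:

    import numpy as np
    from scipy.optimize import linprog

    def l1_min(C, d):
        # Minimize ||d - C y||_1 by solving the LP of Equations 9-10.
        # Variables are stacked as [y+; y-; t], all nonnegative.
        m, n = C.shape
        I = np.eye(m)
        c = np.concatenate([np.zeros(2 * n), np.ones(m)])  # Equation 9
        A_ub = np.block([[C, -C, -I],                      # Equation 10
                         [-C, C, -I]])
        b_ub = np.concatenate([d, -d])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
        return res.x[:n] - res.x[n:2 * n]                  # y = y+ - y-

For an overdetermined C, the returned y is the L₁ regression solution; the per-row residuals t can be read from res.x[2 * n:].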

The L₁ norm of a nonlinear function may be minimized by applying the linear minimization described earlier iteratively. For example, supposing errors between predictions ƒ(x) and observations y:

$$\mathrm{error}(x) = y - f(x) \qquad (\text{Equation 11})$$

and minimization of the following is desired:

$$\min_{x}\; \| \mathrm{error}(x) \|_{1} \qquad (\text{Equation 12})$$

Given an estimate x, a new estimate x + δ_(x) may be computed with:

$$\min_{\delta_{x}}\; \left\| \mathrm{error}(x) - \frac{df(x)}{dx}\,\delta_{x} \right\|_{1} \qquad (\text{Equation 13})$$

and this may be repeated until convergence. Equation 13 represents a linear L₁ minimization and can be solved as described for Equation 5.

The step will often increase rather than decrease the objective, preventing convergence, because the δ_(x) that minimizes Equation 13 may be outside the region where the linearization dƒ(x)/dx is accurate. An adaptive trust region helps ensure convergence by limiting the step to a finite region near the current estimate, and adaptively determining what the size of that region should be. The trust region's role is similar to the adaptive damping factor λ that Levenberg-Marquardt adds to Gauss-Newton.

To limit the step size ∥δ_(x)∥₁ in Equation 13 to some trust region size μ, Equation 10 may be augmented to:

$$\begin{bmatrix} C & -C & -I \\ -C & C & -I \\ I & I & 0 \end{bmatrix} \begin{bmatrix} y^{+} \\ y^{-} \\ t \end{bmatrix} \le \begin{bmatrix} d \\ -d \\ \mu \end{bmatrix} \qquad (\text{Equation 14})$$

μ may be adapted to a useful step size around the current estimate. If the most recent δ_(x) decreases the objective, μ may be increased to 10μ, assuming that the best trust region is no smaller than the current μ. If δ_(x) does not decrease the objective, then δ_(x) is outside the region where the linearization is valid. In that case, μ may be decreased to ∥δ_(x)∥₁/10 and the objective may be recomputed.
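
The following Python sketch puts Equations 13 and 14 and the μ-adaptation rule together, reusing numpy and linprog as in the sketch above; f, jac, and the other names are illustrative, and the trust region is imposed componentwise through the [I I 0] rows, as written in Equation 14:

    import numpy as np
    from scipy.optimize import linprog

    def l1_step(J, r, mu):
        # Minimize ||r - J delta||_1 subject to delta+ + delta- <= mu
        # (Equations 13-14), with variables stacked as [delta+; delta-; t].
        m, n = J.shape
        c = np.concatenate([np.zeros(2 * n), np.ones(m)])
        A_ub = np.block([[J, -J, -np.eye(m)],
                         [-J, J, -np.eye(m)],
                         [np.eye(n), np.eye(n), np.zeros((n, m))]])
        b_ub = np.concatenate([r, -r, np.full(n, mu)])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs")
        return res.x[:n] - res.x[n:2 * n]

    def slp_minimize(f, jac, y, x, mu=1.0, iters=50):
        # Successive linear programming with the adaptive trust region:
        # grow mu tenfold on success, shrink to ||delta||_1 / 10 on failure.
        best = np.sum(np.abs(y - f(x)))
        for _ in range(iters):
            delta = l1_step(jac(x), y - f(x), mu)
            obj = np.sum(np.abs(y - f(x + delta)))
            if obj < best:
                x, best, mu = x + delta, obj, 10.0 * mu
            else:
                mu = np.sum(np.abs(delta)) / 10.0
        return x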

FIG. 3 shows another embodiment of the disclosed subject matter, an L₁ matrix factorization method. At 300, a set of observations may be received. The observations may comprise at least one known observation and at least one unknown observation. For example, a matrix can be used to describe the numbers of flight reservations from source cities to destination cities, with each row of the matrix representing a source city and each column of the matrix representing a destination city. Some of the values in the matrix may be received ("known observations"); e.g., there are 43 reservations for a flight between Washington, D.C. and San Francisco, Calif. Other values in the matrix may not be received ("unknown observations"); e.g., the number of reservations from Charleston, S.C. to Rochester, N.Y. may be unknown or unavailable. A first low-rank factor U and a second low-rank factor V may be defined at 310. An expression that describes the set of observations using low-rank factorization with respect to the first low-rank factor U and the second low-rank factor V may be minimized at 320. At 330 this minimization may generate an error, which may be reduced by minimizing each of the at least one unknown observation with respect to U and V simultaneously using successive linear programming. An estimate may be obtained of each of the at least one unknown observation at 340 based upon the step of minimizing the error. A more technical description of this method is now provided.

Suppose an observation matrix Y has observations Y_(r_1,c_1), Y_(r_2,c_2), . . . , Y_(r_k,c_k) present and other observations missing. Then low-rank matrix factorization minimizes:

$$\left\| [\, Y_{r_{1},c_{1}} \;\ldots\; Y_{r_{k},c_{k}} \,]^{T} - [\, u_{r_{1}}v_{c_{1}} \;\ldots\; u_{r_{k}}v_{c_{k}} \,]^{T} \right\|_{1} \qquad (\text{Equation 15})$$

with respect to low-rank factors U and V, where u_(r) is the r^(th) row of U and v_(c) is the c^(th) column of V. Eriksson et al. showed that Wiberg's approach could be used to minimize Equation 15 by combining the linear and nonlinear L₁ minimizations for a single set of variables.

Low-rank factor U may be fixed, and each column v_(c) of V may be solved for by minimizing

$$\left\| [\, Y_{r_{i_{1}},c} \;\ldots\; Y_{r_{i_{n}},c} \,]^{T} - [\, u_{r_{i_{1}}};\, \ldots;\, u_{r_{i_{n}}} \,]\, v_{c} \right\|_{1} \qquad (\text{Equation 16})$$

with respect to v_(c). This is a linear minimization, meaning v_(c) and dv_(c)/dU can be found as described earlier.

Then, Equation 15 may be rewritten as a function of U only:

$$\left\| [\, Y_{r_{1},c_{1}} \;\ldots\; Y_{r_{k},c_{k}} \,]^{T} - [\, u_{r_{1}}v_{c_{1}}(U) \;\ldots\; u_{r_{k}}v_{c_{k}}(U) \,]^{T} \right\|_{1} \qquad (\text{Equation 17})$$

and it may be minimized with respect to U by computing the steps described in Equation 13. To perform this calculation, the errors y − ƒ(x), which are the vector described by Equation 17, and the derivative of the predictions with respect to U are required. In this case, the individual predictions are u_(r_i)v_(c_i)(U) and their derivatives are:

$$\frac{d\, u_{r_{i}}v_{c_{i}}(U)}{dU} = \frac{\partial\, u_{r_{i}}v_{c_{i}}(U)}{\partial U} + \frac{\partial\, u_{r_{i}}v_{c_{i}}(U)}{\partial v_{c_{i}}}\, \frac{dv_{c_{i}}}{dU} \qquad (\text{Equation 18})$$

where

$$\frac{\partial\, u_{r_{i}}v_{c_{i}}(U)}{\partial u_{r_{i}}} = v_{c_{i}}^{T} \qquad (\text{Equation 19})$$

and the partial derivative with respect to other components of U is zero; and

$$\frac{\partial\, u_{r_{i}}v_{c_{i}}(U)}{\partial v_{c_{i}}} = u_{r_{i}} \qquad (\text{Equation 20})$$

Solving for the v_(c) independently produces the same results as solving for all of V in one linear program while greatly simplifying the method.
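
A minimal Python sketch of the per-column solve follows, assuming numpy and reusing the l1_min routine sketched earlier; mask marks which entries of Y are observed, and the names are illustrative:

    import numpy as np

    def solve_V_columns(U, Y, mask):
        # Solve Equation 16 for each column v_c independently given U.
        V = np.zeros((U.shape[1], Y.shape[1]))
        for c in range(Y.shape[1]):
            rows = mask[:, c]                 # observed entries of column c
            V[:, c] = l1_min(U[rows, :], Y[rows, c])
        return V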

For L₁ factorization, minimization may be performed with respect to U and V simultaneously using the successive linear programming method described above. This approach converges approximately as well as the Wiberg approach and is much faster than the Wiberg method for most large problems.

The Wiberg approach effectively removes the unknowns V from the iterative minimization. However, L₁-Wiberg has two speed disadvantages. First, the L₁ errors t_(i) for each individual observation are added as explicit unknowns in the linear programming problem, and the number of unknowns in V that are removed can be insignificant compared to the number of t_(i)'s. Second, the Wiberg derivative matrix (Equation 18) can be more dense than the simultaneous derivative matrix (Equations 21, 22), which slows the simplex solve.

This simultaneous method has the same objective (Equation 15) and error function as the Wiberg approach. The derivatives of the predictions with respect to U and V are required for the update step (Equation 13). The derivatives of the prediction u_(r_i)v_(c_i) with respect to u_(r_i) and v_(c_i) are:

$$\frac{d\, u_{r_{i}}v_{c_{i}}}{du_{r_{i}}} = v_{c_{i}}^{T} \qquad (\text{Equation 21})$$

and

$$\frac{d\, u_{r_{i}}v_{c_{i}}}{dv_{c_{i}}} = u_{r_{i}} \qquad (\text{Equation 22})$$

The other derivatives are zero.

One embodiment of the disclosed subject matter is a general Wiberg minimization applied to arbitrary nonlinear functions of two sets of variables. As an example of this idea, L₁ bundle adjustment may be implemented as a general Wiberg minimization.

Turning now to general Wiberg minimization, L₁-Wiberg factorization solves for V and its derivative with respect to U using the closed-form linear minimization (Equations 5-10), but solves for U using the iterative nonlinear minimization (Equation 13). Adapting the Wiberg method to minimize a nonlinear function of U is possible as long as the function is linear in V. But many functions are nonlinear in two sets of variables. In bundle adjustment, for instance, the objective function is a sum of reprojection errors, which are nonlinear in both the three-dimensional point positions and the six-degree-of-freedom camera positions.

In one embodiment of the disclosed subject matter shown in FIG. 4, general Wiberg minimization may comprise defining a first function ƒ(U, V) that may be nonlinear in both a first set of variables U and a second set of variables V at 400. The first function ƒ(U, V) may be transformed into ƒ(U, V(U)) at 410 such that the members of the first set of variables U may be independent variables and the members of the second set of variables V may be dependent on the first set of variables. At 420, first assumed values of the first set of variables U may be assigned. The second set of variables V may be iteratively estimated at 430 based upon the transformed first function ƒ(U, V(U)) and the assumed values of the first set of variables U such that ƒ(U, V(U)) may be minimized with respect to V. New estimates of the first set of variables U may be iteratively computed to minimize the transformed first function ƒ(U, V(U)) based upon the assumed values of the first set of variables U, the calculated values of the second set of variables V, a derivative of the second set of variables V with respect to the first set of variables U, and a total derivative of the transformed first function ƒ(U, V(U)) with respect to the members of the first set of variables U at 440. The derivative may be computed using finite differences. That is, the derivatives are estimated by incrementing the independent variable, observing the resulting difference in the dependent variable, and using the ratio of the change in the dependent variable to the change in the independent variable as an estimate of the derivative. The derivative may also be found analytically.
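
As a concrete illustration of the finite-difference option, the following is a minimal Python sketch, assuming numpy and assuming some routine solve_V that returns the inner minimizer V for a given U; both names are illustrative:

    import numpy as np

    def dV_dU_fd(solve_V, U, eps=1e-6):
        # Finite-difference estimate of dV/dU: perturb each element of U,
        # re-solve for V, and take the difference quotient.
        V0 = solve_V(U)
        D = np.zeros((V0.size, U.size))
        u = U.ravel()
        for j in range(u.size):
            u_pert = u.copy()
            u_pert[j] += eps
            D[:, j] = (solve_V(u_pert.reshape(U.shape)) - V0).ravel() / eps
        return D   # rows index vec(V), columns index vec(U)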

The minimizing of the transformed first function ƒ(U, V(U)) based upon the assumed values of the first set of variables U may comprise minimizing L₁ error, minimizing L₂ error, or a maximum likelihood estimation. The L₁ error may be minimized using successive linear programming. The L₂ error may be minimized using the Gauss-Newton method. The maximum likelihood estimation may be determined by the Newton-Raphson method. For each iteration, a different subset of U may be selected and the transformed first function ƒ(U, V(U)) may be minimized with respect to the selected subset of the variables. A subset of U may be selected to be a smallest subset of U that creates a solvable subproblem. In addition, a nested minimization may be performed on any number of variables. For example, a set of variables U₁, . . . , U_(n) may be defined. The first function ƒ(U, V) may be transformed into ƒ(U₁, V(U₁)), ƒ(U₂, V(U₁)), . . . , ƒ(U_(n), V(U_(n−1))), ƒ(U_(n+1), V(U_(n))) and minimized with respect to U_(n). The total derivative of the transformed first function may also be computed automatically such that the nesting may be performed with respect to n sets of variables.

More specifically, an iterative minimization for V as well as U may be employed. With this approach, there is an outer loop that minimizes with respect to U, and within each U iteration there is an inner loop that minimizes with respect to V. This method may be applied to bundle adjustment and factorization problems where, given U, V breaks down into independent subproblems v_(c). In this case the time for iteratively solving for the v_(c) is small because each v_(c) is much smaller than U.

To determine dv_(c)/dU when v_(c) is found iteratively, v_(c) may be solved for by substituting v_(c) for x in Equation 13. The final estimate for v_(c) is v_(c)^(previous) + δ_(v_c) for some constant v_(c)^(previous). The derivative of v_(c) with respect to U is the derivative of δ_(v_c):

$$\frac{dv_{c}}{dU} = \frac{d\delta_{v_{c}}}{dU} \qquad (\text{Equation 23})$$

$$= \frac{d\delta_{v_{c}}}{d\left( dp(v_{c})/dv_{c} \right)} \, \frac{d\left( dp(v_{c})/dv_{c} \right)}{dU} \qquad (\text{Equation 24})$$

$$\;\; + \; \frac{d\delta_{v_{c}}}{d\,\mathrm{error}(v_{c})} \, \frac{d\,\mathrm{error}(v_{c})}{dU} \qquad (\text{Equation 25})$$

The derivatives of error(v_(c)) and dp(v_(c))/dv_(c) with respect to U depend on the specific function that is being minimized.

The derivatives of the predictions with respect to U may be computed by generalizing Equation 18 to:

$$\frac{dp(U)}{dU} = \frac{\partial p(U)}{\partial U} + \frac{\partial p(V)}{\partial V} \, \frac{dV}{dU} \qquad (\text{Equation 26})$$

Minimization may now be performed with respect to U by substituting this derivative for dƒ(x)/dx in Equation 13.
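
In code, Equation 26 is a single chain-rule composition. A minimal sketch follows, where the arguments are assumed to be numpy Jacobians over flattened variables and the names are illustrative:

    def total_derivative(dp_dU, dp_dV, dV_dU):
        # Equation 26: dp/dU = (partial p / partial U)
        #                    + (partial p / partial V) (dV/dU).
        # Shapes: dp_dU (m, |U|); dp_dV (m, |V|); dV_dU (|V|, |U|),
        # e.g. from dV_dU_fd above or from Equations 23-25.
        return dp_dU + dp_dV @ dV_dU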

If the inner iterations for v_(c) converge, the final steps δ_(v_c) will be zero. This means that in the linear programming solution for δ_(v_c), the simplex method can exclude elements of δ_(v_c) from the basis, and its derivatives with respect to U will be zero according to the methods described earlier concerning linear programming and linear L₁ minimization. In this case, the method degenerates to an expectation-maximization-like method in which the v_(c)'s are effectively fixed during the U update. Expectation-maximization solves for two sets of unknowns alternately; it solves for each set of unknowns while holding the other constant.

To ensure that v_(c) will be included in the basis, instead of substituting error(v_(c)) and dp(v_(c))/dv_(c) for error(x) and dƒ(x)/dx directly in Equation 13:

$$\min_{\delta_{v_{c}}}\; \left\| \mathrm{error}(v_{c}) - \frac{dp(v_{c})}{dv_{c}}\,\delta_{v_{c}} \right\|_{1} \qquad (\text{Equation 27})$$

the disclosed subject matter may solve for δ_(v_c)′ = δ_(v_c) + ε, ε = [10⁻⁶ . . . 10⁻⁶] in:

$$\min_{\delta'_{v_{c}}}\; \left\| \left( \mathrm{error}(v_{c}) - \frac{dp(v_{c})}{dv_{c}}\,\varepsilon \right) - \frac{dp(v_{c})}{dv_{c}}\,\delta'_{v_{c}} \right\|_{1} \qquad (\text{Equation 28})$$

δ_(v_c)′ will be ε at convergence and will be included in the basis since it is nonzero. δ_(v_c) = δ_(v_c)′ − ε may then be recovered, and the derivatives of δ_(v_c) may be taken to be those of δ_(v_c)′. This method allows for including v_(c) in the basis, although those skilled in the art will recognize that other approaches may be possible.

As described earlier, a trust region can be used to prevent divergence of the successive linear programming iteration. In the general Wiberg method disclosed herein, both the outer and inner iterations are successive linear programming, and the trust region can be used to prevent divergence in both cases.

Another embodiment of the disclosed subject matter comprises an L₁-Wiberg bundle adjustment, shown in FIG. 5, where at least one camera feature or calibration is known, or no feature is known, at 530. A plurality of images may be received from a camera at 500. The plurality of images may comprise a sequence, where the sequence may be, for example, chronologically determined or position- or location-based. One or more two-dimensional features may be identified in each of a plurality of images in the received sequence of images at 510. A three-dimensional point may be estimated for each of the identified one or more two-dimensional features. Each of the one or more two-dimensional features may be tracked at 520 through successive images in the plurality of images. At 540 a two-dimensional image error may be iteratively minimized between each of the tracked one or more two-dimensional features and an image reprojection with respect to the three-dimensional point corresponding to the one or more two-dimensional features and a three-dimensional position of the camera corresponding to one or more of the plurality of images at 550. The camera may have no known features or at least one known feature. Examples of camera features may include focal length, image center, and radial distortion. The determination of the camera's position may be performed using general Wiberg minimization or nested Wiberg minimization. An L₁ norm of the image error may be minimized with respect to each of the at least one known feature simultaneously using successive linear programming. An L₁ norm of the image error may be minimized with respect to each of the one or more unknowns simultaneously using successive linear programming.

First, a technical description of L₁-Wiberg bundle adjustment, where at least one camera feature is known, will be provided. Given the two-dimensional observations of three-dimensional points in an image collection, bundle adjustment may estimate the three-dimensional position of each point and the six-degree-of-freedom position (rotation and translation) of the camera for each image. The L₁-Wiberg bundle adjustment disclosed herein handles nonlinearities (e.g., perspective projection and rotations parameterized by Euler angles) and missing observations easily.

Suppose an observation consists of point c_(i) at location x_(i) in image r_(i), for i = 1, . . . , n. Reprojections or predictions given estimates of the camera and point positions may be determined by:

$$p_{i} = \pi\left( R(\rho_{r_{i}})\, X_{c_{i}} + t_{r_{i}} \right) \qquad (\text{Equation 29})$$

X_(c_i) is the three-dimensional point position. ρ_(r_i), R(ρ_(r_i)), and t_(r_i) are the rotation Euler angles, rotation matrix, and translation giving the six-degree-of-freedom camera position (specifically, the world-to-camera transformation) for image r_(i), respectively. π is the perspective projection. The error vector may be described by:

$$\mathrm{error}(\rho, t, X) = [\, x_{1} - p_{1} \;\;\ldots\;\; x_{n} - p_{n} \,]^{T} \qquad (\text{Equation 30})$$

and minimized by computing:

$$\min_{\rho,\, t,\, X}\; \| \mathrm{error}(\rho, t, X) \|_{1} \qquad (\text{Equation 31})$$
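
For concreteness, the following Python sketch evaluates Equations 29-31 for a list of observations; the Euler-angle convention (rotations about x, y, z composed as R_z R_y R_x) is one common choice and is an assumption here, as are the function names:

    import numpy as np

    def rot(rho):
        # Rotation matrix from Euler angles (a, b, g) about x, y, z.
        a, b, g = rho
        Rx = np.array([[1, 0, 0],
                       [0, np.cos(a), -np.sin(a)],
                       [0, np.sin(a), np.cos(a)]])
        Ry = np.array([[np.cos(b), 0, np.sin(b)],
                       [0, 1, 0],
                       [-np.sin(b), 0, np.cos(b)]])
        Rz = np.array([[np.cos(g), -np.sin(g), 0],
                       [np.sin(g), np.cos(g), 0],
                       [0, 0, 1]])
        return Rz @ Ry @ Rx

    def reproject(rho, t, X):
        # Equation 29: p = pi(R(rho) X + t), pi the perspective projection.
        Xc = rot(rho) @ X + t
        return Xc[:2] / Xc[2]

    def bundle_l1_objective(rhos, ts, Xs, obs):
        # Equations 30-31: obs is a list of (r_i, c_i, x_i) with x_i a
        # 2-vector; returns the L1 norm of the stacked reprojection errors.
        errs = [x - reproject(rhos[r], ts[r], Xs[c]) for r, c, x in obs]
        return np.sum(np.abs(np.concatenate(errs)))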

To implement bundle adjustment as a general Wiberg minimization, a minimization may be performed with respect to the cameras in the outer loop and with respect to the individual points X_(c_i) in inner loops.

As with matrix factorization, all of the unknowns may be estimated simultaneously using iterative L₁ minimization directly. The method of general Wiberg minimization disclosed herein operates by solving for V iteratively rather than in closed form. In another embodiment, because V is found iteratively, it may be split into two sets of variables found using the Wiberg approach. This results in a nested Wiberg minimization that can effectively minimize with respect to three sets of variables. Those skilled in the art will recognize that theoretically n sets of variables may be minimized with further nesting.

In general, a nested Wiberg minimization may be performed by having a second set of variables V defined in terms of a third set of variables D, and a second function ƒ(V, D) to be minimized may be defined that is nonlinear in both the second set of variables V and the third set of variables D at 600. The second function ƒ(V, D) may be transformed at 610 into ƒ(V, D(V)) such that the members of the second set of variables V may be independent variables and the members of the third set of variables D may be dependent on the second set of variables V, and such that the first function is now represented by ƒ(U, V(U), D(V, U)). Second assumed values of the second set of variables V may be assigned at 620. The third set of variables D may be iteratively estimated at 630 based upon the transformed second function ƒ(V, D(V)) and the second assumed values of the second set of variables V, such that ƒ(V, D(V)) may be minimized with respect to D. At 640 new estimates of the second set of variables V may be iteratively computed to minimize the transformed second function ƒ(V, D(V)) based upon the assumed values of the second set of variables V, the calculated values of the third set of variables D, a derivative of the third set of variables D with respect to the second set of variables V, and a total derivative of the transformed second function ƒ(V, D(V)) with respect to the members of the second set of variables V. At 650 new estimates of the first set of variables U may be iteratively computed to minimize the transformed first function ƒ(U, V(U), D(V, U)) based upon the assumed values of the first set of variables U, the calculated values of the second set of variables V, the calculated values of the third set of variables D, the derivative of D with respect to V, the derivative of D with respect to U, the derivative of V with respect to U, and the total derivative of the transformed first function ƒ(U, V(U), D(V, U)) with respect to the first set of variables U.

For example, suppose there are three sets of variables U, V, and D, and that given U and V a minimization is performed with respect to D in closed form. Further, given U, a minimization is performed with respect to V in an inner iteration, and with respect to U in an outer iteration. The total derivative of predictions p with respect to U may be utilized to minimize with respect to U using the nonlinear L₁ minimization by computing:

$$\frac{dp}{dU} = \frac{\partial p}{\partial U} + \left( \frac{\partial p}{\partial V} + \frac{\partial p}{\partial D}\,\frac{dD}{dV} \right) \frac{dV}{dU} + \frac{\partial p}{\partial D}\,\frac{dD}{dU} \qquad (\text{Equation 32})$$

Equation 26 is the total derivative of p with respect to U, with V a function of U. Similarly, Equation 32 is the total derivative of p with respect to U, but with both V and D functions of U. The expression in parentheses, which is the total derivative of p with respect to V with D a function of V, reflects the nesting.
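
A sketch of Equation 32 as a chain-rule composition over numpy Jacobians, mirroring total_derivative above (the argument names are illustrative):

    def nested_total_derivative(dp_dU, dp_dV, dp_dD, dD_dV, dD_dU, dV_dU):
        # Equation 32; the parenthesized term is the total derivative of p
        # with respect to V, with D a function of V.
        return dp_dU + (dp_dV + dp_dD @ dD_dV) @ dV_dU + dp_dD @ dD_dU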

The L₁-Wiberg bundle adjustment disclosed above applies when the camera features or calibration (focal length, image center, radial distortion) is known. In another embodiment, a projective bundle adjustment may be used to recover structure and motion from uncalibrated image sequences, as shown in FIG. 5. A projective reconstruction may include 3×4 camera projection matrices and 4-dimensional projective points that are consistent with the image observations and are known up to a common 4×4 transformation. This transformation may be identified, and the projective reconstruction upgraded to a Euclidean reconstruction, given some knowledge of the scene geometry or camera intrinsics.

For example, if point c_(i) at location x_(i) = [x y 1]^(T) in image r_(i), for i = 1, . . . , n, is observed, then the reprojections (predictions) given estimates of the projection matrices and points may be expressed as:

$$p_{i} = d_{r_{i},c_{i}}\, C_{r_{i}}\, X_{c_{i}} \qquad (\text{Equation 33})$$

where p_(i) is a 3-dimensional projective vector and d_(r_i,c_i) is the inverse projective depth for observation p_(i), a scalar. The error may be determined by:

$$\mathrm{error}(C, X, d) = [\, x_{1} - p_{1} \;\;\ldots\;\; x_{n} - p_{n} \,]^{T} \qquad (\text{Equation 34})$$

and the minimization may be determined by:

$$\min_{C,\, X,\, d}\; \| \mathrm{error}(C, X, d) \|_{1} \qquad (\text{Equation 35})$$
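
A minimal Python sketch of the projective prediction and error (Equations 33-34) follows, assuming C_mats maps image indices to 3×4 projection matrices, Xs maps point indices to 4-vectors, and depths maps (r, c) pairs to the scalar inverse depths; the names are illustrative:

    import numpy as np

    def projective_error(C_mats, Xs, depths, obs):
        # Equation 33: p_i = d_{r_i, c_i} C_{r_i} X_{c_i}; Equation 34
        # stacks the residuals x_i - p_i, with x_i = [x, y, 1].
        errs = []
        for r, c, x in obs:
            p = depths[(r, c)] * (C_mats[r] @ Xs[c])
            errs.append(x - p)
        return np.concatenate(errs)

With C and X fixed, choosing depths[(r, c)] = 1 / (C_mats[r] @ Xs[c])[2] normalizes the last element of p_(i) to 1, which corresponds to the closed-form inverse-depth solve described next.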

The unknowns may be estimated using nested Wiberg minimization at 560. In this example, each inverse depth may be determined independently, in closed form, given point and projection matrix estimates. Each projection matrix, given point estimates, may be determined using an inner Wiberg minimization, letting the inverse depths vary implicitly. The points may be determined in an outer Wiberg minimization, letting the projection matrices and inverse depths vary implicitly.

Given point and projection matrix estimates, the projective depths may be "read off" as the last element of CX, rather than estimated as is known in the art. Here, however, the inverse depths are explicitly estimated as an example of a nested Wiberg minimization.

As with factorization and Euclidean bundle adjustment, projective bundle adjustment may also be performed by minimizing with respect to all of the unknowns simultaneously, using the nonlinear L₁ minimization described earlier, at 570.

FIGS. 7-9 were generated to provide a comparison of the method of L₁-Wiberg factorization used previously and the method disclosed herein. Following a previously described experiment, 100 7×12 random measurement matrices Y were generated, with elements drawn from [−1,1]. 20% of the elements were marked as missing. 10% of Y's entries were selected as outliers, and noise drawn from [−5,5] was added to them. U and V were initialized randomly with elements in [−1,1], and both the Wiberg and simultaneous methods were applied to them. For each trial, the Wiberg and simultaneous methods receive exactly the same input and have the same initial residual. Both methods were executed for 400 iterations, which is more than is necessary for convergence.

FIG. 7 shows a histogram of the final residual for each method. As shown in the figure, the Wiberg method tended to produce lower final residuals, though the residuals from the two methods were comparable. Out of the 100 trials, the final residual achieved by Wiberg was better than that of the simultaneous method by 5% in 57 trials. The simultaneous method was better than Wiberg by 5% in 17 trials. The residuals for both methods were within 5% of each other in the remaining 26 trials.

Each method was executed for 400 iterations, and the number of steps until convergence is shown in FIG. 8. For the purposes of this graph, a method was considered to have converged when it was within a negligible difference (0.1) of the final residual achieved after the 400 iterations. The Wiberg method required fewer iterations, converging in 20 or fewer iterations in over half of the trials. The Wiberg method converged in fewer steps in 83 trials, compared to 16 trials for the simultaneous method. Both methods converged equally quickly in 1 trial.

In these small-scale examples, the linear programming solve time for the main update step was 1.1 ms for the Wiberg step and 3.1 ms for the simultaneous method step. However, for larger problems, the linear programming time increases quickly, as shown in FIG. 9. This figure plots the time for the linear programming solve for one U update, for different problem sizes, for both the Wiberg and simultaneous approaches. The horizontal axis gives the number of columns in V and the vertical axis gives the time in seconds. The separate plots give the times for different numbers of rows in U. Both axes are on a logarithmic scale. The graph in FIG. 9 shows that Wiberg is faster than simultaneous if U has few rows, and simultaneous is faster otherwise.

Next, the Wiberg approach is compared to the simultaneous bundle adjustment for the ability to recover structure and motion, first using synthetic data. For example, structure from motion may be obtained from a synthesized data set comprising three-dimensional points uniformly distributed on a sphere and the camera moving in a ring around it, looking toward the center in each image. 300 instances of this problem were created, varying the following: the number of images (2, 3, 5, 7, 11, 18); the number of points (10, 32, 100, 316, 1000); and the error in the initial estimates of the camera rotation Euler angles, camera translation, and three-dimensional point positions. The errors in the initial estimates were drawn from [−ε, ε], for ε in {0.01, 0.0215, 0.0464, 0.1, 0.215, 0.464, 1, 2.154, 4.641, 10}.

In almost all cases the two methods converged to zero error, despite the high errors in the initial estimates, within 25 iterations and often within just a few iterations. The major difference was the time required per step, which is summarized in FIGS. 10 and 11. For small numbers of images, the Wiberg method is much faster than the simultaneous method. For large numbers of images, the simultaneous method is faster than the Wiberg method. The crossover is between 4 and 5 images.

FIG. 12 shows an example image from the "rover" sequence, with tracked points shown as black dots. The camera looks out from the back of a rover while the rover executes three large loops. The sequence includes about 700 images and about 10,000 three-dimensional points. The Wiberg bundle adjustment correctly recovers the structure and motion, and the estimated motion is shown in FIG. 13. The result is locally correct and consistent with the global motion estimates from GPS and odometry. Factorization and affine structure-from-motion cannot recover the camera position because perspective effects are extremely strong due to the large difference in distance to the near and far points.

The convergence and speed of the Wiberg and simultaneous projective bundle adjustment methods were compared using a synthetic data set comprising 300 instances of the problem, varying the number of images (10, 14, 19, 26, 36, 50); the number of points (10, 15, 22, 33, 50); and the error in the initial estimates (10 values exponentially spaced between 0.001 and 0.1).

FIG. 14 is analogous to FIGS. 9 and 10, and shows the time per outer linear programming solve as a function of the problem size. Point positions were determined rather than camera positions in the outer loop. The observed trend was the same as described earlier: the Wiberg step was faster for smaller numbers of points, and the simultaneous method was faster for larger numbers of points. The crossover was around 22 points.

The Wiberg method had somewhat better convergence than the simultaneous method when started from the same estimates. Out of the 300 problems, both converged to zero residual in 232 cases (those with the smaller errors). For the larger errors, the Wiberg method converged to a smaller residual in 57 cases, and the simultaneous method converged to a smaller residual in 2 cases.

The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit embodiments of the disclosed subject matter to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to explain the principles of embodiments of the disclosed subject matter and their practical applications, to thereby enable others skilled in the art to utilize those embodiments as well as various embodiments with various modifications as may be suited to the particular use contemplated.

The invention claimed is:
 1. A computer-implemented method for determining a camera position in a plurality of images captured by the camera, comprising: defining a first function ƒ(U, V) that is nonlinear in both a first set of variables U and a second set of variables V, wherein the first set of variables U corresponds to a first feature of the plurality of images and the second set of variables V corresponds to a second feature of the plurality of images; transforming, by a processor, the first function ƒ(U, V) into ƒ(U, V(U)) such that the members of the first set of variables U are independent variables and the members of the second set of variables V are dependent on the first set of variables; assigning first assumed values of the first set of variables U; iteratively estimating, by the processor, the second set of variables V based upon the transformed first function ƒ(U, V(U)) and the assumed values of the first set of variables U such that ƒ(U, V(U)) is minimized with respect to V; and determining the camera position by iteratively computing new estimates of the first set of variables U to minimize the transformed first function ƒ(U, V(U)) based upon the assumed values of the first set of variables U and the calculated values of the second set of variables V and a derivative of the second set of variables V with respect to the first set of variables U and a total derivative of the transformed first function ƒ(U, V(U)) with respect to the members of the first set of variables U.
 2. The method of claim 1, further comprising: defining a set of variables U₁, . . . , U_(n); transforming the first function ƒ(U, V) into ƒ(U₁, V(U₁)), ƒ(U₂, V(U₁)), . . . , ƒ(U_(n), V(U_(n−1))), ƒ(U_(n+1), V(U_(n))); and minimizing with respect to U_(n).
 3. The method of claim 1, wherein the minimizing the transformed first function ƒ(U, V(U)) based upon the assumed values of the first set of variables U comprises minimizing L₁ error.
 4. The method of claim 3, wherein the L₁ error is minimized using successive linear programming.
 5. The method of claim 1, wherein the minimizing the transformed first function ƒ(U, V(U)) based upon the assumed values of the first set of variables U comprises minimizing L₂ error.
 6. The method of claim 5, wherein the L₂ error is minimized using the Gauss-Newton method.
 7. The method of claim 1, wherein the minimizing the transformed first function ƒ(U, V(U)) based upon the assumed values of the first set of variables U comprises a maximum likelihood estimation.
 8. The method of claim 7, wherein the maximum likelihood estimation is determined by the Newton-Raphson method.
 9. The method of claim 1, further comprising: defining the second set of variables V in terms of a third set of variables D; defining a second function ƒ(V, D) to be minimized, that is nonlinear in both the second set of variables V and the third set of variables D; transforming the second function ƒ(V, D) into ƒ(V, D(V)) such that the members of the second set of variables V are independent variables and the members of the third set of variables D are dependent on the second set of variables V and that the first function is now represented by ƒ(U, V(U), D(V, U)); assigning second assumed values of the second set of variables V; iteratively estimating the third set of variables D based upon the transformed second function ƒ(V, D(V)) and the second assumed values of the second set of variables V, such that ƒ(V, D(V)) is minimized with respect to D; iteratively computing new estimates of the second set of variables V to minimize the transformed second function ƒ(V, D(V)) based upon the assumed values of the second set of variables V and the calculated values of the third set of variables D and a derivative of the third set of variables D with respect to the second set of variables V and a total derivative of the transformed second function ƒ(V, D(V)) with respect to the members of the second set of variables V; and iteratively computing new estimates of the first set of variables U to minimize the transformed first function ƒ(U, V(U), D(V, U)) based upon the assumed values of the first set of variables U and the calculated values of the second set of variables V and the calculated values of the third set of variables D, the derivative of D with respect to V, the derivative of D with respect to U, and the derivative of V with respect to U, and the total derivative of the transformed first function ƒ(U, V(U), D(V, U)) with respect to the first set of variables U.
 10. The method of claim 1, wherein ƒ(U, V(U)) is minimized with respect to a subset of the variables in U on each iteration, selecting a different subset of U on each iteration.
 11. The method of claim 1, wherein the derivatives are computed using finite differences.